Student’s t distribution (t)#
The Student’s t distribution is a continuous distribution on (\mathbb{R}) that looks normal near the center but has heavier tails.
It shows up in two complementary ways:
as the sampling distribution of the t-statistic when the variance is unknown
as a practical robust noise model (a heavy-tailed alternative to a Gaussian)
Learning goals#
By the end, you should be able to:
write down the PDF/CDF and understand the role of degrees of freedom
explain where the t distribution comes from (normal + chi-square)
compute key properties (when moments exist, entropy, characteristic function)
sample from it using NumPy only
use scipy.stats.t for evaluation, simulation, and fitting
import platform
import numpy as np
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
import scipy
from scipy import optimize, special, stats
from scipy.stats import norm
from scipy.stats import t as t_dist
# Plotly rendering (CKC convention)
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
# Reproducibility
rng = np.random.default_rng(42)
np.set_printoptions(precision=6, suppress=True)
VERSIONS = {
"python": platform.python_version(),
"numpy": np.__version__,
"scipy": scipy.__version__,
"plotly": plotly.__version__,
}
VERSIONS
{'python': '3.12.9', 'numpy': '1.26.2', 'scipy': '1.15.0', 'plotly': '6.5.2'}
1) Title & Classification#
Name: t (Student’s t distribution; SciPy: scipy.stats.t)
Type: Continuous
Support (standard form): (x \in (-\infty, \infty))
Parameter space (standard form): degrees of freedom (\nu > 0)
SciPy location/scale: loc \in \mathbb{R}, scale > 0, with
[ X = \mathrm{loc} + \mathrm{scale}\,T, \qquad T \sim t_{\nu}. ]
Unless stated otherwise, we work with the standard t distribution (loc=0, scale=1).
2) Intuition & Motivation#
What it models#
A t distribution behaves like a normal distribution near its center, but its tails decay only polynomially. This makes it a good model for occasional large deviations.
A canonical generative story is:
Draw a standard normal (Z \sim \mathcal{N}(0,1))
Draw an independent chi-square (V \sim \chi^2_{\nu})
Form the ratio
[ T = \frac{Z}{\sqrt{V/\nu}} \sim t_{\nu}. ]
The random denominator inflates or deflates the normal draw, producing heavier tails.
Typical real-world use cases#
Small-sample inference: the t-statistic ((\bar X-\mu_0)/(S/\sqrt{n})) follows a t distribution under normality.
Robust modeling: t likelihoods reduce the influence of outliers (common in finance, sensor data, web metrics).
Bayesian modeling: t priors/likelihoods appear as normal–gamma mixtures; posterior predictive distributions are often t.
Relations to other distributions#
As (\nu \to \infty), (t_{\nu}) converges to (\mathcal{N}(0,1)).
(t_1) is exactly the standard Cauchy distribution.
If (T \sim t_{\nu}), then (T^2 \sim F_{1,\nu}) (F distribution).
The t distribution is a scale mixture of normals: if (\lambda \sim \mathrm{Gamma}(\nu/2,\ \nu/2)) (shape–rate), then
[ T\mid\lambda \sim \mathcal{N}(0, 1/\lambda), \quad\Rightarrow\quad T \sim t_{\nu}. ]
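These relations are easy to spot-check numerically. A quick sketch with `scipy.stats` (the seed and tolerances are illustrative):

```python
import numpy as np
from scipy import stats

nu = 5.0
x = np.linspace(0.5, 4.0, 8)

# t_1 is exactly the standard Cauchy: the PDFs agree pointwise.
cauchy_ok = np.allclose(stats.t.pdf(x, 1), stats.cauchy.pdf(x))

# If T ~ t_nu, then T^2 ~ F(1, nu):
# P(T^2 <= c) = P(|T| <= sqrt(c)) = 2 F_t(sqrt(c)) - 1 for c >= 0.
f_ok = np.allclose(stats.f.cdf(x**2, 1, nu), 2.0 * stats.t.cdf(x, nu) - 1.0)

# Scale mixture of normals: lambda ~ Gamma(nu/2, rate nu/2),
# T | lambda ~ N(0, 1/lambda)  =>  T ~ t_nu (Kolmogorov-Smirnov check).
rng = np.random.default_rng(0)
lam = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=100_000)  # rate = nu/2
mix = rng.standard_normal(100_000) / np.sqrt(lam)
ks_stat = stats.kstest(mix, stats.t(nu).cdf).statistic

print(cauchy_ok, f_ok, ks_stat)
```

The KS statistic should be small (on the order of 1/√n), since the mixture draws come from exactly the target distribution.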
3) Formal Definition#
Let (T \sim t_{\nu}) be a standard Student’s t random variable with degrees of freedom (\nu>0).
PDF#
[ f(t;\nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad t \in \mathbb{R}. ]
For the location–scale family (X = \mathrm{loc} + \mathrm{scale}\,T) with (\mathrm{scale}>0), the density is
[ f_X(x;\nu,\mathrm{loc},\mathrm{scale}) = \frac{1}{\mathrm{scale}}\,f\!\left(\frac{x-\mathrm{loc}}{\mathrm{scale}};\nu\right). ]
CDF#
The CDF can be written using the regularized incomplete beta function (I_x(a,b)). Let
[ w(t) = \frac{\nu}{\nu + t^2} \in (0,1]. ]
Then for (t\ge 0):
[ F(t;\nu) = 1 - \tfrac{1}{2} I_{w(t)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right), ]
and by symmetry for (t<0):
[ F(t;\nu) = \tfrac{1}{2} I_{w(t)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right). ]
def _validate_df_loc_scale(df: float, loc: float, scale: float) -> tuple[float, float, float]:
df = float(df)
loc = float(loc)
scale = float(scale)
if not (df > 0.0):
raise ValueError("df (ν) must be > 0")
if not (scale > 0.0):
raise ValueError("scale must be > 0")
return df, loc, scale
def t_logpdf(x: np.ndarray, df: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
'''Log-PDF of the Student t distribution with df, loc, scale.
Uses stable log-gamma expressions.
'''
df, loc, scale = _validate_df_loc_scale(df, loc, scale)
x = np.asarray(x, dtype=float)
z = (x - loc) / scale
log_norm = (
special.gammaln((df + 1.0) / 2.0)
- special.gammaln(df / 2.0)
- 0.5 * np.log(df * np.pi)
- np.log(scale)
)
return log_norm - ((df + 1.0) / 2.0) * np.log1p((z * z) / df)
def t_pdf(x: np.ndarray, df: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
return np.exp(t_logpdf(x, df=df, loc=loc, scale=scale))
def t_cdf(x: np.ndarray, df: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
'''CDF via the regularized incomplete beta representation.'''
df, loc, scale = _validate_df_loc_scale(df, loc, scale)
x = np.asarray(x, dtype=float)
z = (x - loc) / scale
w = df / (df + z * z)
ib = special.betainc(df / 2.0, 0.5, w)
return np.where(z >= 0.0, 1.0 - 0.5 * ib, 0.5 * ib)
# Sanity checks against SciPy
df = 7.5
loc = -0.3
scale = 1.7
x = np.linspace(-6, 6, 501)
assert np.allclose(t_pdf(x, df, loc=loc, scale=scale), t_dist.pdf(x, df, loc=loc, scale=scale), rtol=1e-10, atol=0)
assert np.allclose(t_cdf(x, df, loc=loc, scale=scale), t_dist.cdf(x, df, loc=loc, scale=scale), rtol=1e-10, atol=0)
"ok"
'ok'
4) Moments & Properties#
Mean / variance / skewness / kurtosis#
Let (T\sim t_{\nu}) (standard, centered).
Mean: (\mathbb{E}[T]=0) if (\nu>1); undefined for (\nu\le 1).
Variance: (\mathrm{Var}(T)=\dfrac{\nu}{\nu-2}) if (\nu>2); infinite for (1<\nu\le 2).
Skewness: 0 if (\nu>3); undefined otherwise.
Excess kurtosis: (\dfrac{6}{\nu-4}) if (\nu>4); infinite for (2<\nu\le 4).
For the location–scale family (X = \mathrm{loc} + \mathrm{scale}\,T), when the mean/variance exist:
[ \mathbb{E}[X]=\mathrm{loc}, \qquad \mathrm{Var}(X)=\mathrm{scale}^2\,\frac{\nu}{\nu-2}. ]
MGF and characteristic function#
The moment generating function (M_T(t)=\mathbb{E}[e^{tT}]) does not exist (is infinite) for any nonzero (t). Intuition: t tails behave like (|x|^{-(\nu+1)}), and (e^{t x}) dominates any polynomial as (x\to\infty).
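One way to see the divergence numerically: truncate the MGF integral at (\pm M) and watch it grow without bound as (M) increases (a sketch; the specific (\nu) and (s) are arbitrary choices):

```python
import numpy as np
from scipy import integrate, stats

nu, s = 3.0, 1.0          # any s != 0 shows the same behavior
pdf = stats.t(nu).pdf

# Truncated "MGF" integrals over [-M, M]: e^{s x} eventually dominates
# the polynomial tail x^{-(nu+1)}, so these keep growing with M
# instead of converging to a finite limit.
truncated = [integrate.quad(lambda x: np.exp(s * x) * pdf(x), -M, M)[0]
             for M in (5, 10, 20, 40)]

print(truncated)
```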
The characteristic function exists for all (t) and can be written using the modified Bessel function (K_{\alpha}):
[ \varphi_T(t) = \mathbb{E}[e^{i t T}] = \frac{(\sqrt{\nu}|t|)^{\nu/2}\,K_{\nu/2}(\sqrt{\nu}|t|)}{2^{\nu/2-1}\,\Gamma(\nu/2)}. ]
Entropy#
The differential entropy of the standard t distribution is
[ H(T) = \log\bigl(\sqrt{\nu}\,B(\nu/2, 1/2)\bigr) + \frac{\nu+1}{2}\left[\psi\left(\frac{\nu+1}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right], ]
where (B) is the beta function and (\psi) is the digamma function. For (X=\mathrm{loc}+\mathrm{scale}\,T),
[ H(X)=H(T)+\log(\mathrm{scale}). ]
def t_moments(df: float, loc: float = 0.0, scale: float = 1.0) -> dict:
df, loc, scale = _validate_df_loc_scale(df, loc, scale)
mean = np.nan
var = np.nan
skew = np.nan
excess_kurt = np.nan
if df > 1:
mean = loc
if df > 2:
var = (scale * scale) * df / (df - 2.0)
elif df > 1:
var = np.inf
if df > 3:
skew = 0.0
if df > 4:
excess_kurt = 6.0 / (df - 4.0)
elif df > 2:
excess_kurt = np.inf
return {
"df": df,
"loc": loc,
"scale": scale,
"mean": mean,
"var": var,
"skew": skew,
"excess_kurt": excess_kurt,
"kurt": excess_kurt + 3.0 if np.isfinite(excess_kurt) else excess_kurt,
}
def t_entropy(df: float, scale: float = 1.0) -> float:
df, _, scale = _validate_df_loc_scale(df, 0.0, scale)
a = df / 2.0
term1 = 0.5 * np.log(df) + special.betaln(a, 0.5)
term2 = (df + 1.0) / 2.0 * (special.digamma((df + 1.0) / 2.0) - special.digamma(a))
return float(term1 + term2 + np.log(scale))
def t_charfun(t: np.ndarray, df: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
'''Characteristic function of Student-t (df, loc, scale).'''
df, loc, scale = _validate_df_loc_scale(df, loc, scale)
t = np.asarray(t, dtype=float)
a = df / 2.0
x = np.sqrt(df) * np.abs(scale * t)
out = np.empty_like(x, dtype=float)
mask0 = x == 0.0
out[mask0] = 1.0
xm = x[~mask0]
log_const = -(a - 1.0) * np.log(2.0) - special.gammaln(a)
out[~mask0] = np.exp(log_const + a * np.log(xm) + np.log(special.kv(a, xm)))
return np.exp(1j * loc * t) * out
df_demo = 5
{
"moments": t_moments(df_demo),
"entropy": t_entropy(df_demo),
}
{'moments': {'df': 5.0,
'loc': 0.0,
'scale': 1.0,
'mean': 0.0,
'var': 1.6666666666666667,
'skew': 0.0,
'excess_kurt': 6.0,
'kurt': 9.0},
'entropy': 1.627502672414396}
# Characteristic function: closed form vs Monte Carlo estimate (memory-safe)
df = 5
n = 80_000
samples = t_dist.rvs(df, size=n, random_state=rng)
t_grid = np.linspace(0, 15, 300)
phi_formula = np.real(t_charfun(t_grid, df=df))
phi_mc = np.array([np.real(np.mean(np.exp(1j * t0 * samples))) for t0 in t_grid])
fig = go.Figure()
fig.add_trace(go.Scatter(x=t_grid, y=phi_formula, mode="lines", name="closed form"))
fig.add_trace(go.Scatter(x=t_grid, y=phi_mc, mode="lines", name="Monte Carlo", line=dict(dash="dot")))
fig.update_layout(
title="Characteristic function of t distribution (df=5)",
xaxis_title="t",
yaxis_title="Re φ(t)",
width=900,
height=420,
)
fig
5) Parameter Interpretation#
The t distribution has three knobs in SciPy: (df, loc, scale).
df ((\nu), degrees of freedom) controls tail heaviness:
small (\nu) ⇒ very heavy tails and unstable sample means
(\nu\to\infty) ⇒ approaches the standard normal
loc shifts the distribution (it is the center of symmetry, and the mean when it exists).
scale stretches the distribution.
A practical way to compare tails is via the decay rate:
[ f(t;\nu) \propto |t|^{-(\nu+1)}\quad\text{for large }|t|. ]
So each additional degree of freedom makes the tail a little lighter.
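The survival function inherits this decay: integrating the tail gives (P(T>x) \sim c\,x^{-\nu}), so doubling (x) divides the tail probability by roughly (2^{\nu}). A quick check with scipy.stats.t.sf:

```python
import numpy as np
from scipy import stats

# P(T > x) ~ c * x^{-nu} for large x, so sf(x) / sf(2x) -> 2^nu.
ratios = {}
for nu in (1, 3, 10):
    x = 1e4
    ratios[nu] = stats.t.sf(x, nu) / stats.t.sf(2.0 * x, nu)

print(ratios)  # each value is close to 2**nu
```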
x = np.linspace(-6, 6, 2000)
dfs = [1, 2, 5, 10, 30]
fig = go.Figure()
for df in dfs:
fig.add_trace(go.Scatter(x=x, y=t_pdf(x, df=df), mode="lines", name=f"t(df={df})"))
fig.add_trace(go.Scatter(x=x, y=norm.pdf(x), mode="lines", name="N(0,1)", line=dict(color="black", dash="dash")))
fig.update_layout(
title="PDF shape changes: degrees of freedom ν",
xaxis_title="x",
yaxis_title="density",
width=900,
height=420,
)
fig
# Effect of loc and scale
x = np.linspace(-10, 10, 2500)
params = [
(5, 0.0, 1.0),
(5, 2.0, 1.0),
(5, 0.0, 2.0),
(30, 0.0, 1.0),
]
fig = go.Figure()
for df, loc, scale in params:
fig.add_trace(go.Scatter(x=x, y=t_pdf(x, df=df, loc=loc, scale=scale), mode="lines", name=f"df={df}, loc={loc}, scale={scale}"))
fig.update_layout(
title="Location/scale effects",
xaxis_title="x",
yaxis_title="density",
width=900,
height=420,
)
fig
6) Derivations#
6.1 Expectation#
For the standard t distribution, the PDF is symmetric: (f(t;\nu)=f(-t;\nu)). So the integral of (t f(t;\nu)) over symmetric limits is 0.
However, a mean exists only if the integral of (|t| f(t;\nu)) is finite. Because the tails behave like (|t|^{-(\nu+1)}),
[ \int_{-\infty}^{\infty} |t|\, f(t;\nu)\,dt\ \text{converges iff}\ \nu>1. ]
Thus:
if (\nu>1), (\mathbb{E}[T]=0)
if (\nu\le 1), the mean is undefined
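The nonexistent mean is visible in simulation: the running mean of (t_1) (Cauchy) samples never settles down, while for (\nu=3) the law of large numbers applies. A sketch with a fixed seed:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

def t_draws(df: float) -> np.ndarray:
    # Normal / chi-square ratio, as in the generative story above.
    v = rng.gamma(shape=df / 2.0, scale=2.0, size=n)  # chi-square(df)
    return rng.standard_normal(n) / np.sqrt(v / df)

run_mean_1 = np.cumsum(t_draws(1.0)) / np.arange(1, n + 1)  # mean undefined
run_mean_3 = np.cumsum(t_draws(3.0)) / np.arange(1, n + 1)  # LLN applies

# The nu=3 running mean ends near 0; the nu=1 trajectory keeps
# jumping whenever a huge sample arrives, no matter how large n is.
print(run_mean_1[-1], run_mean_3[-1])
```

(In fact, the mean of n i.i.d. Cauchy draws is itself standard Cauchy, so averaging buys nothing at all for (\nu=1).)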
6.2 Variance#
Use the ratio representation (T = Z/\sqrt{V/\nu}) with (Z\sim\mathcal{N}(0,1)), (V\sim\chi^2_{\nu}), independent. Then
[ T^2 = \frac{Z^2}{V/\nu} = \nu\,\frac{Z^2}{V}. ]
If (\nu>2), the variance exists and
[ \mathbb{E}[T^2] = \nu\,\mathbb{E}[Z^2]\,\mathbb{E}[1/V] = \nu\cdot 1 \cdot \frac{1}{\nu-2} = \frac{\nu}{\nu-2}. ]
(The identity (\mathbb{E}[1/V]=1/(\nu-2)) holds for (V\sim\chi^2_{\nu}) and (\nu>2).)
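The identity (\mathbb{E}[1/V]=1/(\nu-2)) is easy to confirm by Monte Carlo (a quick sketch; sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 6.0
v = rng.gamma(shape=nu / 2.0, scale=2.0, size=1_000_000)  # chi-square(nu)

est = float(np.mean(1.0 / v))     # Monte Carlo estimate of E[1/V]
exact = 1.0 / (nu - 2.0)          # closed form, valid for nu > 2
print(est, exact)                 # the two should be close
```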
6.3 Likelihood#
Given observations (x_1,\dots,x_n) modeled as i.i.d.
[ X_i \sim t_{\nu}(\mathrm{loc},\mathrm{scale}), ]
the log-likelihood is
[ \ell(\nu,\mathrm{loc},\mathrm{scale}) = \sum_{i=1}^n \log f_X(x_i;\nu,\mathrm{loc},\mathrm{scale}), ]
with
[ \log f_X(x_i) = \log\Gamma\left(\frac{\nu+1}{2}\right) - \log\Gamma\left(\frac{\nu}{2}\right) - \tfrac12\log(\nu\pi) - \log(\mathrm{scale}) - \frac{\nu+1}{2}\log\left(1 + \frac{1}{\nu}\left(\frac{x_i-\mathrm{loc}}{\mathrm{scale}}\right)^2\right). ]
This is what MLE routines optimize (including scipy.stats.t.fit).
def t_loglik(x: np.ndarray, df: float, loc: float, scale: float) -> float:
return float(np.sum(t_logpdf(x, df=df, loc=loc, scale=scale)))
def t_nll(theta: np.ndarray, x: np.ndarray) -> float:
df, loc, log_scale = float(theta[0]), float(theta[1]), float(theta[2])
scale = float(np.exp(log_scale))
# Penalize invalid df directly (optimizer-friendly).
if df <= 0:
return np.inf
return -t_loglik(x, df=df, loc=loc, scale=scale)
# Quick demo: optimize (df, loc, scale) on simulated data
true = {"df": 6.0, "loc": 1.5, "scale": 0.8}
x = t_dist.rvs(true["df"], loc=true["loc"], scale=true["scale"], size=3_000, random_state=rng)
theta0 = np.array([10.0, np.median(x), np.log(np.std(x) + 1e-9)])
res = optimize.minimize(t_nll, theta0, args=(x,), method="Nelder-Mead")
df_hat, loc_hat, scale_hat = float(res.x[0]), float(res.x[1]), float(np.exp(res.x[2]))
{ "true": true, "mle_hat": {"df": df_hat, "loc": loc_hat, "scale": scale_hat} }
{'true': {'df': 6.0, 'loc': 1.5, 'scale': 0.8},
'mle_hat': {'df': 5.6380083522442455,
'loc': 1.4939574148670296,
'scale': 0.7853995473987132}}
7) Sampling & Simulation (NumPy-only)#
Algorithm#
Use the defining ratio:
[ T = \frac{Z}{\sqrt{V/\nu}}, \qquad Z\sim\mathcal{N}(0,1),\ \ V\sim\chi^2_{\nu},\ \ Z\perp V. ]
To sample (V \sim \chi^2_{\nu}) for general (non-integer) (\nu), use the gamma representation:
[ V \sim \chi^2_{\nu} \iff V \sim \mathrm{Gamma}(k=\nu/2,\ \theta=2) ]
(shape (k), scale (\theta)).
Then return (X=\mathrm{loc}+\mathrm{scale}\,T).
def t_rvs_numpy(
df: float,
loc: float = 0.0,
scale: float = 1.0,
size: int | tuple[int, ...] = 1,
rng: np.random.Generator | None = None,
) -> np.ndarray:
'''NumPy-only sampler for Student-t via Normal / Chi-square ratio.'''
df, loc, scale = _validate_df_loc_scale(df, loc, scale)
rng = np.random.default_rng() if rng is None else rng
z = rng.standard_normal(size=size)
v = rng.gamma(shape=df / 2.0, scale=2.0, size=size) # Chi-square(df)
t = z / np.sqrt(v / df)
return loc + scale * t
# Monte Carlo check
df = 8
n = 400_000
samples = t_rvs_numpy(df, size=n, rng=rng)
mc_mean = float(np.mean(samples))
mc_var = float(np.var(samples))
theory = t_moments(df)
(mc_mean, theory["mean"], mc_var, theory["var"])
(-0.0026917757260773163, 0.0, 1.3319992347178284, 1.3333333333333333)
8) Visualization#
We’ll visualize:
PDFs for multiple degrees of freedom (tail heaviness)
CDFs (how quickly probability accumulates)
Monte Carlo samples vs the theoretical PDF
# PDF + CDF side-by-side
x = np.linspace(-6, 6, 2000)
dfs = [1, 2, 5, 10, 30]
fig = make_subplots(rows=1, cols=2, subplot_titles=("PDF", "CDF"))
for df in dfs:
fig.add_trace(go.Scatter(x=x, y=t_pdf(x, df=df), mode="lines", name=f"df={df}"), row=1, col=1)
fig.add_trace(go.Scatter(x=x, y=t_cdf(x, df=df), mode="lines", name=f"df={df}", showlegend=False), row=1, col=2)
fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_yaxes(title_text="density", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_yaxes(title_text="F(x)", row=1, col=2)
fig.update_layout(title="Student-t: PDF and CDF for different ν", width=1000, height=420)
fig
# Monte Carlo histogram vs theoretical PDF
df = 4
n = 200_000
samples = t_rvs_numpy(df, size=n, rng=rng)
x = np.linspace(-8, 8, 2500)
fig = go.Figure()
fig.add_trace(go.Histogram(x=samples, nbinsx=120, histnorm="probability density", name="samples", opacity=0.55))
fig.add_trace(go.Scatter(x=x, y=t_pdf(x, df=df), mode="lines", name="theoretical PDF", line=dict(color="black")))
fig.update_layout(
title=f"Monte Carlo samples vs PDF (df={df})",
xaxis_title="x",
yaxis_title="density",
width=900,
height=420,
)
fig
9) SciPy Integration (scipy.stats.t)#
SciPy provides a full implementation of the t distribution:
t_dist.pdf(x, df, loc=..., scale=...)
t_dist.cdf(x, df, loc=..., scale=...)
t_dist.rvs(df, loc=..., scale=..., size=..., random_state=...)
t_dist.fit(data) for MLE estimation of (df, loc, scale)
We’ll use it both as a reference implementation and for fitting.
df, loc, scale = 6.0, 1.2, 0.7
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
pdf_vals = t_dist.pdf(x, df, loc=loc, scale=scale)
cdf_vals = t_dist.cdf(x, df, loc=loc, scale=scale)
samples = t_dist.rvs(df, loc=loc, scale=scale, size=20_000, random_state=rng)
fit_df, fit_loc, fit_scale = t_dist.fit(samples)
{
"pdf": pdf_vals,
"cdf": cdf_vals,
"fit": {"df": float(fit_df), "loc": float(fit_loc), "scale": float(fit_scale)},
}
{'pdf': array([0.002866, 0.018138, 0.135473, 0.521502, 0.274424]),
'cdf': array([0.001902, 0.009998, 0.068652, 0.392352, 0.851679]),
'fit': {'df': 5.833234442641876,
'loc': 1.198612165986697,
'scale': 0.7063928901680383}}
10) Statistical Use Cases#
10.1 Hypothesis testing (why the t distribution exists)#
If (X_1,\dots,X_n) are i.i.d. (\mathcal{N}(\mu,\sigma^2)), then the t statistic
[ T = \frac{\bar X - \mu}{S/\sqrt{n}} ]
follows (t_{n-1}), where (S) is the sample standard deviation. That is the mathematical justification for t tests and t confidence intervals.
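For example, a two-sided 95% confidence interval for the mean uses the (t_{n-1}) quantile in place of the normal one (a minimal sketch; the data here are synthetic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=15)    # small synthetic sample

n = x.size
x_bar = x.mean()
s = x.std(ddof=1)                               # sample standard deviation S
crit = stats.t.ppf(0.975, df=n - 1)             # t_{n-1} critical value
half = crit * s / np.sqrt(n)
ci = (x_bar - half, x_bar + half)

# The t critical value exceeds the normal 1.96, widening the interval
# to account for the variance being estimated from the data.
print(ci, crit, stats.norm.ppf(0.975))
```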
10.2 Bayesian modeling#
Two common appearances:
Posterior predictive: with a Normal–Inverse-Gamma prior for ((\mu,\sigma^2)), the posterior predictive distribution for a new observation is Student-t.
Robust likelihood: using a t likelihood instead of Gaussian makes Bayesian inference less sensitive to outliers.
10.3 Generative modeling#
A Student-t noise model is a simple way to generate data with occasional big shocks while remaining symmetric and unimodal.
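To see the "occasional big shocks", compare the extremes of matched-variance Gaussian and t noise (a sketch; the (t_3) draws are rescaled to unit variance):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, nu = 100_000, 3.0

gauss = rng.standard_normal(n)
# Var(T) = nu / (nu - 2) = 3 for nu = 3, so divide by sqrt(3).
heavy = stats.t.rvs(nu, size=n, random_state=rng) / np.sqrt(nu / (nu - 2.0))

# Same variance, very different extremes: the t noise produces
# occasional shocks far beyond anything the Gaussian generates.
print(np.max(np.abs(gauss)), np.max(np.abs(heavy)))
```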
# Simulate the t-statistic under the null and compare to t_{n-1}
n = 12
n_rep = 200_000
mu0 = 0.0
sigma = 2.0
x = rng.normal(loc=mu0, scale=sigma, size=(n_rep, n))
x_bar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)
t_stat = (x_bar - mu0) / (s / np.sqrt(n))
df = n - 1
grid = np.linspace(-6, 6, 2000)
fig = go.Figure()
fig.add_trace(go.Histogram(x=t_stat, nbinsx=140, histnorm="probability density", name="simulated t-stat", opacity=0.55))
fig.add_trace(go.Scatter(x=grid, y=t_dist.pdf(grid, df), mode="lines", name=f"t(df={df})", line=dict(color="black")))
fig.update_layout(
title=f"t-statistic distribution under Normality (n={n} ⇒ df={df})",
xaxis_title="t-stat",
yaxis_title="density",
width=900,
height=420,
)
fig
# Bayesian posterior predictive (Normal data, unknown mean & variance)
# Prior: Normal-Inverse-Gamma (μ, σ^2)
# Synthetic data (with a couple outliers to make things interesting)
rng2 = np.random.default_rng(123)
y = rng2.normal(loc=1.0, scale=1.2, size=25)
y[[3, 17]] += np.array([5.0, -4.0])
# Prior hyperparameters (weakly informative)
mu0 = 0.0
kappa0 = 0.01
alpha0 = 2.0
beta0 = 2.0
n = y.size
y_bar = float(y.mean())
ssq = float(np.sum((y - y_bar) ** 2))
kappa_n = kappa0 + n
mu_n = (kappa0 * mu0 + n * y_bar) / kappa_n
alpha_n = alpha0 + n / 2.0
beta_n = beta0 + 0.5 * ssq + 0.5 * (kappa0 * n / kappa_n) * (y_bar - mu0) ** 2
# Posterior predictive is Student-t with:
# df = 2*alpha_n
# loc = mu_n
# scale^2 = beta_n * (kappa_n + 1) / (alpha_n * kappa_n)
df_pred = 2.0 * alpha_n
loc_pred = mu_n
scale_pred = float(np.sqrt(beta_n * (kappa_n + 1.0) / (alpha_n * kappa_n)))
x = np.linspace(loc_pred - 6 * scale_pred, loc_pred + 6 * scale_pred, 2500)
pred_pdf = t_dist.pdf(x, df_pred, loc=loc_pred, scale=scale_pred)
plug_in_pdf = norm.pdf(x, loc=y_bar, scale=float(np.std(y, ddof=1)))
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=pred_pdf, mode="lines", name=f"posterior predictive t (df={df_pred:.1f})"))
fig.add_trace(go.Scatter(x=x, y=plug_in_pdf, mode="lines", name="plug-in Normal", line=dict(dash="dash")))
fig.add_trace(go.Histogram(x=y, nbinsx=20, histnorm="probability density", name="data", opacity=0.4))
fig.update_layout(
title="Posterior predictive: Student-t vs plug-in Normal",
xaxis_title="y",
yaxis_title="density",
width=900,
height=450,
)
fig
11) Pitfalls#
Invalid parameters: df must be (>0), scale must be (>0).
Nonexistent moments: the mean/variance/kurtosis require (\nu>1), (\nu>2), (\nu>4) respectively. With small (\nu), sample means and variances can be extremely unstable.
Numerical issues: for extreme (|x|) or small (\nu), computing the PDF directly can underflow; prefer logpdf. CDF tails are best computed by special functions (SciPy does this well).
Parameterization confusion: many texts define the t distribution only via (\nu), but SciPy uses a location–scale family (df, loc, scale).
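The underflow point is quick to demonstrate (the input is an illustrative extreme value; the log-PDF stays finite where the PDF has already hit 0.0):

```python
import numpy as np
from scipy import stats

x, nu = 1e150, 2.0
pdf_val = stats.t.pdf(x, nu)       # underflows to 0.0 this far in the tail
log_val = stats.t.logpdf(x, nu)    # finite: ~ -(nu + 1) * log(x) + const

print(pdf_val, log_val)
```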
12) Summary#
The Student’s t distribution (t_{\nu}) is a continuous, symmetric, heavy-tailed distribution on (\mathbb{R}).
It arises as (Z/\sqrt{V/\nu}) (normal divided by chi-square scale), explaining its role in t tests.
df=1 gives the Cauchy; (\nu\to\infty) approaches the normal.
Moments exist only above thresholds: mean ((\nu>1)), variance ((\nu>2)), kurtosis ((\nu>4)).
A NumPy-only sampler is easy via the ratio representation; SciPy provides a full-featured implementation and MLE fitting.